A Kernel Theory of Modern Data Augmentation
Authors
Abstract
Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding modern data augmentation techniques. We start by showing that for kernel classifiers, data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. We connect this general approximation framework to prior work in invariant kernels, tangent propagation, and robust optimization. Next, we explicitly tackle the compositional aspect of modern data augmentation techniques, proposing a novel model of data augmentation as a Markov process. Under this model, we show that performing k-nearest neighbors with data augmentation is asymptotically equivalent to a kernel classifier. Finally, we illustrate ways in which our theoretical framework can be leveraged to accelerate machine learning workflows in practice, including reducing the amount of computation needed to train on augmented data, and predicting the utility of a transformation prior to training.
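As a rough illustration of the first-order "feature averaging" idea mentioned in the abstract, the sketch below replaces each training point's feature vector with the mean of the feature vectors of its augmented copies, then fits a linear model on the averaged features. Everything specific here is an assumption made for illustration and is not taken from the paper: the random Fourier feature map, the Gaussian-jitter augmentation, the ridge-regression fit, and all function names and parameter values are hypothetical.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Approximate RBF-kernel feature map: phi(x) = sqrt(2/D) * cos(W x + b)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def augment(X, rng, n_copies=8, noise=0.1):
    """Hypothetical class-preserving transformation: small Gaussian jitter.

    Returns an array of shape (n_copies, n_samples, n_features).
    """
    return X[None, :, :] + noise * rng.standard_normal((n_copies,) + X.shape)

def averaged_features(X, W, b, rng, n_copies=8, noise=0.1):
    """Replace phi(x) by the mean of phi over augmented copies of x."""
    X_aug = augment(X, rng, n_copies, noise)                 # (T, n, d)
    T, n, d = X_aug.shape
    Phi = random_fourier_features(X_aug.reshape(T * n, d), W, b)
    return Phi.reshape(T, n, -1).mean(axis=0)                # (n, D)

def fit_ridge(Phi, y, lam=1e-2):
    """Closed-form ridge regression on the (averaged) features."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)

# Toy problem (assumed for illustration): linearly separable labels in 2D.
rng = np.random.default_rng(0)
n, d, D = 200, 2, 64
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] + X[:, 1])

# Random Fourier feature parameters for a unit-bandwidth RBF kernel.
W = rng.standard_normal((D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Phi_avg = averaged_features(X, W, b, rng)
w = fit_ridge(Phi_avg, y)
train_acc = np.mean(np.sign(Phi_avg @ w) == y)
```

Training once on `Phi_avg` touches `n` rows instead of `n * n_copies`, which is the kind of computational saving the abstract alludes to. The second-order variance regularization term is not modeled in this sketch.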
Similar resources
Fisher’s Linear Discriminant Analysis for Weather Data by reproducing kernel Hilbert spaces framework
With recent developments in science and technology, data of a functional nature have become easy to collect, so the statistical analysis of such data is of great importance. As in multivariate analysis, linear combinations of random variables play a key role in functional analysis. The theory of Reproducing Kernel Hilbert Spaces is very important in this context. In this paper we study a gen...
Frobenius kernel and Wedderburn's little theorem
We give a new proof of the well-known Wedderburn's little theorem (1905), which states that a finite division ring is commutative. To build the proof, we apply the concept of the Frobenius kernel from the Frobenius representation theorem in finite group theory.
The Position Occupied by Persian Translations of English Modern Short Stories in Persian Literary Polysystem
Literatures of various cultures interfere with one another, so that each of them may become part of another's literary polysystem. Accordingly, this study attempted to identify what position the Persian literary polysystem allowed English literature, in particular Persian translations of modern English short stories, to occupy during 1990-2005. This study also intended to find ou...
Reproducing Kernel Space Hilbert Method for Solving Generalized Burgers Equation
In this paper, we present a new method based on Reproducing Kernel Space (RKS) theory, together with an iterative algorithm, for solving the Generalized Burgers Equation (GBE). The analytical solution is expressed as a series in an RKS, and the approximate solution u(x,t) is constructed by truncating the series. The convergence of u(x,t) to the analytical solution is also proved.
Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir
The prediction of lithology is necessary in all areas of petroleum engineering: to design a project in any branch of the field, the lithology must be well known. Support vector machines (SVMs) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...